An unusual AI capability assessment recently took place in Minecraft and attracted considerable attention. The old and new versions of Claude 3.5 Sonnet competed in building challenges, and the results highlighted significant differences in their abilities, with the new version (tentatively called Sonnet 3.6) performing especially well. The test, initiated by developer adi, has been humorously dubbed the only reliable benchmark. Evaluation researcher Aidan McLau believes the method is exactly what AI evaluation currently needs, and he points out its connection to aesthetic ability.